Options
- Filters: how to use them
Here you can find information on filters, for example how to accept all gif files in a mirror
- List of options
w mirror with automatic wizard
This is the default scanning option: the engine automatically scans links according to the default options and the filters defined. It does not prompt the user when a "foreign" link is reached.
W semi-automatic mirror with help-wizard (asks questions)
This option makes the engine ask the user whether a link must be mirrored when a new web site is found.
g just get files (saved in the current directory)
This option forces the engine not to scan the files indicated: the engine only downloads the files indicated.
i continue an interrupted mirror using the cache
This option indicates to the engine that a mirror must be updated or continued.
rN recurse get with limited link depth of N
This option sets the maximum recursion depth. The default is infinite (the engine "knows" that it should not go outside the current domain).
a stay on the same address
This is the default primary scanning option: the engine does not leave the current address without permission (given by filters, for example).
d stay on the same principal domain
This option lets the engine follow links to any site on the same principal domain.
Example: a link located at www.someweb.com that goes to members.someweb.com will be followed.
l stay on the same location (.com, etc.)
This option lets the engine follow links to any site in the same location (top-level domain such as .com).
Example: a link located at www.someweb.com that goes to www.anyotherweb.com will be followed.
Warning: this is a potentially dangerous option; limit the recursion depth with the r option.
e go everywhere on the web
This option lets the engine follow links to any site.
Example: a link located at www.someweb.com that goes to www.anyotherweb.org will be followed.
Warning: this is a potentially dangerous option; limit the recursion depth with the r option.
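The a, d, l and e options can be pictured as a comparison of host names. The Python sketch below is illustrative, not HTTrack's implementation: it assumes that the "principal domain" is the last two labels of the host name and that the "location" is the top-level domain, and it ignores filters and multi-label suffixes such as .co.uk. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def principal_domain(host: str) -> str:
    # Assumption: the principal domain is the last two labels (someweb.com).
    return ".".join(host.split(".")[-2:])


def location(host: str) -> str:
    # Assumption: the "location" is the top-level domain (com, org, ...).
    return host.rsplit(".", 1)[-1]


def follows(mode: str, source: str, target: str) -> bool:
    """Return True if a link from `source` to `target` is in scope.

    mode: 'a' same address, 'd' same principal domain,
          'l' same location, 'e' everywhere.
    """
    src = urlsplit(source).hostname or ""
    dst = urlsplit(target).hostname or ""
    if mode == "a":
        return src == dst
    if mode == "d":
        return principal_domain(src) == principal_domain(dst)
    if mode == "l":
        return location(src) == location(dst)
    if mode == "e":
        return True
    raise ValueError(f"unknown scope mode: {mode!r}")
```

With a source page at http://www.someweb.com/index.html, a link to http://members.someweb.com/ is followed under d but not under a, matching the examples above.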
n get non-html files 'near' an html file (ex: an image located outside)
This option lets the engine catch all files referenced on a page, even those located outside the web site.
Example: a list of links to ZIP files on a page.
t test all URLs (even forbidden ones)
This option lets the engine test all links that are not downloaded.
Example: to check a site for broken links
x replace external html links by error pages
This option tells the engine to rewrite every link that is not downloaded so that it points to a warning page.
Example: to browse a site offline while warning people that they must be online if they click an external link.
sN follow robots.txt and meta robots tags
This option sets how the engine treats "robots.txt" files. Webmasters often use this file to exclude cgi-bin directories or other irrelevant pages.
Values:
s0 Ignore robots.txt rules
s1 Follow rules, if compatible with internal filters
s2 Always follow site's rules
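A sketch of how the three sN levels could combine robots.txt rules with internal filters, using Python's standard urllib.robotparser module. The reading of s1 (a URL blocked by robots.txt is still fetched when an internal filter explicitly accepts it) is an assumption, as is the hypothetical filter_accepts flag; meta robots tags are not handled.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def robots_allows(level: int, robots_txt: List[str], url: str,
                  filter_accepts: bool, agent: str = "HTTrack") -> bool:
    """Decide whether `url` may be fetched under option sN (sketch)."""
    if level == 0:
        # s0: robots.txt rules are not taken into account.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt)
    site_allows = parser.can_fetch(agent, url)
    if level == 1:
        # s1 (assumed reading): an explicit internal filter wins over robots.txt.
        return site_allows or filter_accepts
    if level == 2:
        # s2: the site's rules are always followed.
        return site_allows
    raise ValueError(f"unknown robots level: {level}")
```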
bN accept cookies in cookies.txt
This option activates or deactivates cookie handling
b0 do not accept cookies
b1 accept cookies
S stay on the same directory
This option makes the engine stay on the same folder level.
Example: A link in /index.html that points to /sub/other.html will not be followed
D can only go down into subdirs
This is the default option: the engine can go anywhere in the same directory or in lower levels
U can only go to upper directories
This option makes the engine stay on the same folder level or go into upper levels
B can both go up&down into the directory structure
This option lets the engine go into any directory level
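The S, D, U and B options compare the directory of the current page with the directory of the link target. This Python sketch assumes both URLs are on the same host and ignores filters; the helper names are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def directory_of(url: str) -> str:
    """Directory part of a URL path, always ending with '/'."""
    path = urlsplit(url).path or "/"
    return posixpath.dirname(path).rstrip("/") + "/"


def follows_dir(mode: str, source: str, target: str) -> bool:
    """Apply S, D, U or B to a link from `source` to `target` (same host assumed)."""
    src = directory_of(source)
    dst = directory_of(target)
    if mode == "S":
        return dst == src                 # same directory only
    if mode == "D":
        return dst.startswith(src)        # same directory or below
    if mode == "U":
        return src.startswith(dst)        # same directory or above
    if mode == "B":
        return True                       # both up and down
    raise ValueError(f"unknown directory mode: {mode!r}")
```

The trailing '/' kept by directory_of prevents /subway/ from being mistaken for a subdirectory of /sub/.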
Y mirror ALL links located in the first level pages (mirror links)
This option applies to the pages whose URLs are given on the command line
Example: if www.asitelist.com/index.html contains a list of web sites, all these sites will be mirrored
NN name conversion type (N0 *original structure; N1, N2, N3: html/data in one directory)
N0 Site-structure (default)
N1 Html in web/, images/other files in web/images/
N2 Html in web/html, images/other in web/images
N3 Html in web/, images/other in web/
N4 Html in web/, images/other in web/xxx, where xxx is the file extension (all gif files will be placed in web/gif, for example)
N5 Images/other in web/xxx and Html in web/html
N99 All files in web/, with random names (gadget !)
N100 Site-structure, without www.domain.xxx/
N101 Identical to N1 except that "web" is replaced by the site's name
N102 Identical to N2 except that "web" is replaced by the site's name
N103 Identical to N3 except that "web" is replaced by the site's name
N104 Identical to N4 except that "web" is replaced by the site's name
N105 Identical to N5 except that "web" is replaced by the site's name
N199 Identical to N99 except that "web" is replaced by the site's name
N1001 Identical to N1 except that there is no "web" directory
N1002 Identical to N2 except that there is no "web" directory
N1003 Identical to N3 except that there is no "web" directory (the setting used by the g option)
N1004 Identical to N4 except that there is no "web" directory
N1005 Identical to N5 except that there is no "web" directory
N1099 Identical to N99 except that there is no "web" directory
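A sketch of how a URL could map to a local path under N0 to N5, with a root argument standing in for the N10x (site name instead of "web") and N100x (no "web" directory) variants. It assumes HTML is recognized by its .html/.htm extension, that non-HTML files have an extension, and that the URL is absolute; name collisions, N99 and N100 are not handled. It is not HTTrack's actual naming code, and local_path is a hypothetical name.

```python
import posixpath
from urllib.parse import urlsplit

HTML_EXTENSIONS = {".html", ".htm"}  # assumption: HTML detected by extension


def local_path(url: str, mode: int, root: str = "web") -> str:
    """Return the local path of `url` under name conversion type `mode` (0-5)."""
    if mode not in range(6):
        raise ValueError(f"unsupported mode: {mode}")
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"          # directory URLs saved as index.html
    if mode == 0:
        # N0: keep the site structure, including the host directory (root unused).
        return parts.hostname + path
    name = posixpath.basename(path)
    ext = posixpath.splitext(name)[1].lower()
    is_html = ext in HTML_EXTENSIONS
    if mode == 3:
        subdir = ""                   # N3: everything directly in root
    elif is_html:
        subdir = "html" if mode in (2, 5) else ""
    elif mode in (1, 2):
        subdir = "images"             # N1/N2: non-html files in images/
    else:
        subdir = ext[1:]              # N4/N5: one directory per extension
    return posixpath.join(root, subdir, name)
```

For example, http://www.someweb.com/dir/logo.gif becomes web/images/logo.gif under N1 and web/gif/logo.gif under N4, while passing root="" approximates the N100x variants.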
LN long names
L0 Filenames and directory names are limited to 8 characters + 3 for extension
L1 No restrictions (default)
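Under L0, each path component is cut down to the 8+3 form. This sketch simply truncates the base name and the extension; how HTTrack avoids the collisions that truncation can create is not covered, and the function names are hypothetical.

```python
import posixpath


def short_name(component: str) -> str:
    """Truncate one file or directory name to 8 characters + 3 for the extension."""
    base, ext = posixpath.splitext(component)
    return base[:8] + ("." + ext[1:4] if ext else "")


def short_path(path: str) -> str:
    """Apply the 8+3 limit to every component of a '/'-separated path."""
    return "/".join(short_name(part) for part in path.split("/"))
```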
K keep original links (e.g. http://www.adr/link) (K0 *relative link)
This option is kept only for compatibility reasons
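With the default K0, a link between two saved files is written as a path relative to the linking page, whereas K keeps the original absolute URL. A minimal sketch of the relative form using posixpath.relpath, assuming both files are given as paths inside the mirror directory; relative_link is a hypothetical name.

```python
import posixpath


def relative_link(from_file: str, to_file: str) -> str:
    """Relative link from the saved page `from_file` to the saved file `to_file`."""
    start = posixpath.dirname(from_file) or "."
    return posixpath.relpath(to_file, start)
```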
pN priority mode:
p0 just scan, don't save anything (for checking links)
p1 save only html files
p2 save only non html files
p3 save all files
p7 get html files first, then process other files
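The priority modes decide which files are written to disk and, for p7, in which order they are fetched. A sketch, assuming p7 saves all files (like p3) and only changes the order; the is_html predicate is a hypothetical stand-in for the engine's own type detection.

```python
from typing import Callable, List


def should_save(mode: int, is_html: bool) -> bool:
    """Return True if a file is saved under priority mode pN."""
    if mode == 0:
        return False          # p0: scan only, save nothing
    if mode == 1:
        return is_html        # p1: html files only
    if mode == 2:
        return not is_html    # p2: non-html files only
    if mode in (3, 7):
        return True           # p3/p7: all files (assumed for p7)
    raise ValueError(f"unknown priority mode: {mode}")


def p7_order(urls: List[str], is_html: Callable[[str], bool]) -> List[str]:
    """p7: html files first, then the others, each group in its original order."""
    # sorted() is stable, and False (html) sorts before True (non-html).
    return sorted(urls, key=lambda url: not is_html(url))
```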
cN number of multiple connections (*c8)
Sets the number of simultaneous connections
O path for mirror/logfiles+cache (-O path_mirror[,path_cache_and_logfiles])
This option defines the path for the mirror and for the log and cache files
Example: -O "/user/webs","/user/logs"
P proxy use (-P proxy:port or -P user:pass@proxy:port)
This option defines the proxy used for this mirror
Example: -P proxy.myhost.com:8080
F user-agent field (-F "user-agent name")
This option defines the user-agent field
Example: -F "Mozilla/4.5 (compatible; HTTrack 1.2x; Windows 98)"
mN maximum file length for a non-html file
This option defines the maximum size for non-html files
Example: -m100000
mN,N' for non html (N) and html (N')
This option defines the maximum size for non-html files and html files
Example: -m100000,250000
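A sketch of how the -mN and -mN,N' values could be parsed and applied, assuming sizes are in bytes and that html files are unlimited when N' is omitted (the single-value form only mentions non-html files). The function names are hypothetical.

```python
from typing import Optional, Tuple


def parse_max_sizes(option: str) -> Tuple[int, Optional[int]]:
    """Parse '-mN' or '-mN,N2' into (non-html limit, html limit or None)."""
    if not option.startswith("-m"):
        raise ValueError(f"not an -m option: {option!r}")
    values = option[len("-m"):].split(",")
    max_other = int(values[0])
    max_html = int(values[1]) if len(values) > 1 else None
    return max_other, max_html


def size_allowed(size: int, is_html: bool,
                 max_other: int, max_html: Optional[int]) -> bool:
    """Return True if a file of `size` bytes stays within the limits."""
    limit = max_html if is_html else max_other
    return limit is None or size <= limit
```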
MN maximum overall size that can be downloaded/scanned
This option defines the maximum number of bytes that can be downloaded
Example: -M1000000
EN maximum mirror time in seconds (60=1 minute, 3600=1 hour)
This option defines the maximum time that the mirror can last
Example: -E3600
AN maximum transfer rate in bytes/second (1000=1 kB/s max)
This option defines the maximum transfer rate
Example: -A2000
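The -M, -A and -E limits interact: at most A bytes per second can be transferred, so reaching the -M byte limit takes at least M/A seconds. With the example values, -M1000000 at -A2000 needs at least 1000000 / 2000 = 500 seconds, well under -E3600. A small helper for this check (hypothetical names; it gives a lower bound that assumes the transfer always runs at the -A cap):

```python
def min_seconds(max_bytes: int, max_rate: int) -> float:
    """Lower bound on the time needed to transfer max_bytes at max_rate bytes/s."""
    return max_bytes / max_rate


def can_reach_size_limit(max_bytes: int, max_rate: int, max_seconds: int) -> bool:
    """True if the -M limit could be reached before the -E time limit expires."""
    return min_seconds(max_bytes, max_rate) <= max_seconds
```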
GN pause transfer if N bytes reached, and wait until lock file is deleted
This option asks the engine to pause every time N bytes have been transferred, and to restart when the lock file "hts-pause.lock" is deleted
Example: -G20000000
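Resuming a paused mirror only requires deleting the lock file. A Python sketch of helpers that wait for the pause and then resume it; it assumes hts-pause.lock is created in the directory passed as project_dir, and the function names are hypothetical.

```python
import os
import time
from typing import Optional

LOCK_NAME = "hts-pause.lock"  # lock file named in the -G description


def is_paused(project_dir: str) -> bool:
    """True while the pause lock file exists."""
    return os.path.exists(os.path.join(project_dir, LOCK_NAME))


def wait_for_pause(project_dir: str, poll_seconds: float = 5.0,
                   timeout: Optional[float] = None) -> bool:
    """Poll until the lock file appears; return False if `timeout` expires."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while not is_paused(project_dir):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


def resume(project_dir: str) -> bool:
    """Delete the lock file so that the engine restarts; False if not paused."""
    try:
        os.remove(os.path.join(project_dir, LOCK_NAME))
    except FileNotFoundError:
        return False
    return True
```

Once wait_for_pause(project_dir) returns True, call resume(project_dir) when the mirror may continue.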
u check document type if unknown (cgi,asp..)
This option defines how the engine checks the file type
u0 do not check
u1 check but /
u2 check always
RN number of retries, in case of timeout or non-fatal errors (*R0)
This option sets the maximum number of retries for a file
o *generate output html file in case of error (404..) (o0 don't generate)
This option defines whether the engine generates an html output file when an error occurs
TN timeout, number of seconds after which a non-responding link is shut down
This option defines the timeout
Example: -T120
JN traffic jam control, minimum transfer rate (bytes/second) tolerated for a link
This option defines the minimum transfer rate
Example: -J200
HN host is abandoned if: 0=never, 1=timeout, 2=slow, 3=timeout or slow
This option defines whether the engine abandons a host when a timeout or "too slow" error occurs
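The -H modes combine the -T timeout and the -J minimum transfer rate. A sketch of the decision, where the idle time and measured rate of a link are hypothetical inputs that the engine would track itself:

```python
from typing import Tuple


def link_status(idle_seconds: float, rate: float,
                timeout: float, min_rate: float) -> Tuple[bool, bool]:
    """Return (timed_out, too_slow) for a link, using the -TN and -JN values."""
    timed_out = idle_seconds >= timeout
    too_slow = rate < min_rate
    return timed_out, too_slow


def abandon_host(mode: int, timed_out: bool, too_slow: bool) -> bool:
    """Apply -HN: 0=never, 1=timeout, 2=slow, 3=timeout or slow."""
    if mode == 0:
        return False
    if mode == 1:
        return timed_out
    if mode == 2:
        return too_slow
    if mode == 3:
        return timed_out or too_slow
    raise ValueError(f"unknown host mode: {mode}")
```

With -T120 and -J200, a link idle for 130 seconds counts as timed out, so the host is dropped under H1 or H3 but kept under H2.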
&P extended parsing, attempt to parse all links (even in unknown tags or Javascript)
This option activates extended parsing, which attempts to find links in unknown html code or Javascript
j *parse Java Classes (j0 don't parse)
This option defines whether the engine parses Java class files to find the files they include
I *make an index (I0 don't make)
This option defines whether the engine generates an index.html in the top directory
X *delete old files after update (X0 keep them)
This option defines whether, after an update, the engine deletes local files that have been deleted from the remote site or that have been excluded
C *create/use a cache for updates and retries (C0 no cache)
This option defines whether the engine creates a cache for retries and updates
k store all files in cache (not useful if files on disk)
This option defines whether the engine stores all files in the cache
V execute system command after each file ($0 is the filename: -V "rm \$0")
This option lets the engine execute a command for each file saved on disk
q quiet mode (no questions)
Do not ask questions (for example, to confirm an option)
Q log quiet mode (no log)
Do not generate log files
v verbose screen mode
Log messages are printed on the screen
f *log file mode
Log messages are written to two log files
z extra infos log
Adds more information to the log files
Z debug log
Adds debug information to the log files
--mirror *make a mirror of site(s)
--get get the files indicated, do not seek other URLs
--mirrorlinks mirror all links located in the first level pages (identical to -Y)
--testlinks test links in pages
--spider spider site(s), to test links (reports Errors & Warnings)
--update update a mirror, without confirmation
--skeleton make a mirror, but get only html files
--http10 force http/1.0 requests when possible